Skip to content

Comments

Add Python SDK with integration tests#6

Merged
mgoldsborough merged 8 commits intomainfrom
feature/python-sdk
Feb 20, 2026
Merged

Add Python SDK with integration tests#6
mgoldsborough merged 8 commits intomainfrom
feature/python-sdk

Conversation

@mgoldsborough
Copy link
Contributor

Summary

  • Add mpak Python SDK (packages/sdk-python/) for searching, resolving, and downloading MCPB bundles from the mpak registry
  • Fix Accept: application/json header so get_bundle_download() receives JSON metadata instead of following a 302 redirect to binary
  • Fix architecture mapping: x86_64 now maps to x64 (matching Node.js/registry convention) instead of amd64
  • Add 8 integration tests (@pytest.mark.integration) that hit the live registry to verify resolve, download, SHA256 verification, and extraction end-to-end

Test plan

  • pytest -m "not integration" — 12 unit tests pass
  • pytest -m integration — 8 integration tests pass against live registry.mpak.dev
  • ruff check . — clean
  • ruff format --check . — clean

Scaffolds packages/sdk-python/ with:
- Pydantic models generated from live registry OpenAPI spec
  (https://registry.mpak.dev/docs/json via datamodel-code-generator)
- scripts/generate-types.py to regenerate types from spec
- MpakClient with get_bundle_download, load_bundle, load_bundle_from_url
- mpak-loader CLI (drop-in replacement for mcpb-loader.py)
- Error hierarchy matching TypeScript SDK (MpakError, MpakNotFoundError,
  MpakIntegrityError, MpakNetworkError)
- Platform detection and SHA256 fail-closed integrity verification
- 12 unit tests passing, ruff lint clean
- Dependencies: httpx, pydantic (no email-validator)
Fix two bugs that prevented the SDK from working against the live
registry: add Accept: application/json header so the download endpoint
returns JSON metadata instead of a 302 redirect, and correct the
architecture mapping from amd64 to x64 to match the registry convention.

Add 8 integration tests (marked @pytest.mark.integration) that hit
the live registry.mpak.dev to verify resolve, download, SHA256
verification, and extraction work end-to-end.
@mgoldsborough
Copy link
Contributor Author

Requesting review from @shwetank-dev and @JoeCardoso13 — please accept the repo invite first, then take a look.

Copy link
Collaborator

@JoeCardoso13 JoeCardoso13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Thanks for this — the SDK is well-structured overall: clean error hierarchy, good use of httpx, sensible config, and solid integration tests. That said, there are a few issues that need addressing before merge, including one security concern and a bug.


BUG-1 — NameError in loader.py if MpakClient() raises

File: packages/sdk-python/src/mpak/loader.py

client is assigned inside the try block. If MpakClient() itself raises, client is never bound. The except Exception handler then calls sys.exit(1), which raises SystemExit, triggering the finally block — which calls client.close() and throws NameError on the unbound name.

try:
    client = MpakClient()   # if THIS raises...
    ...
except Exception as e:
    sys.exit(1)
finally:
    client.close()          # NameError — client is unbound

Fix: Initialize client = None before the try and guard the close with if client: client.close().


SECURITY-1 — Zip slip vulnerability in client.py:load_bundle_from_url

File: packages/sdk-python/src/mpak/client.py

zipfile.ZipFile.extractall() does not sanitize entry paths. A zip containing entries like ../../etc/cron.d/evil will write files outside of dest_path. SHA256 verification confirms the zip hasn't been tampered with in transit, but does not protect against a malicious zip served from a compromised registry or CDN with a valid pre-computed hash.

with zipfile.ZipFile(bundle_path, "r") as zf:
    zf.extractall(dest_path)   # no path sanitization

Fix: Validate each entry path before extracting:

resolved_dest = dest_path.resolve()
for member in zf.namelist():
    if not (resolved_dest / member).resolve().is_relative_to(resolved_dest):
        raise ValueError(f"Zip slip attempt detected: {member}")
zf.extractall(dest_path)

DEAD-CODE-1 — Unreachable 404 branch in client.py:get_bundle_download

File: packages/sdk-python/src/mpak/client.py

The explicit if response.status_code == 404 check fires before response.raise_for_status(), so a 404 raises MpakNotFoundError and exits the try block immediately. The except httpx.HTTPStatusError handler's own 404 branch is therefore unreachable dead code.

if response.status_code == 404:
    raise MpakNotFoundError(...)   # caught here, exits try
response.raise_for_status()
...
except httpx.HTTPStatusError as e:
    if e.response.status_code == 404:   # never reached
        raise MpakNotFoundError(...) from e

One of the two should be removed.


DESIGN-1 — Bundle download bypasses the configured client

File: packages/sdk-python/src/mpak/client.py

Registry API calls use self._client (configured with User-Agent, Accept, follow_redirects). The actual bundle download creates a raw ad-hoc stream, silently losing those headers:

with httpx.stream("GET", url, follow_redirects=True, timeout=self.config.timeout) as response:

Fix: Use self._client.stream(...) for consistency.


DESIGN-2 — Version "0.1.0" hardcoded in two places independently

Files: src/mpak/__init__.py, src/mpak/types.py

__version__ in __init__.py and the user_agent default in MpakClientConfig both hardcode "0.1.0" independently of pyproject.toml. When the version is bumped, the user agent will silently report the wrong version.

Fix: Derive it at runtime: importlib.metadata.version("mpak").


Minor issues

  • verify_integrity_bytes in _integrity.py — defined and exported but never called anywhere. Dead code or planned for future use?
  • mock_registry_url fixture in tests/conftest.py — defined but never used in any test file.
  • PRODUCT.md monorepo layout — shows sdk/ (old name) and python-sdk/ (wrong name). Should be sdk-typescript/ and sdk-python/ to match actual directories.
  • pytest-asyncio in dev deps — listed as a dependency and asyncio_mode = "auto" is configured, but no async tests exist.
  • generate-types.py string replacementcontent.replace(" EmailStr,\n", "") hardcodes a 4-space indent assumption; will silently fail if the code generator changes its output format.
  • generate-types.py double fetch — the OpenAPI spec is fetched once via urllib.request.urlopen (to print byte count) and then again by datamodel-codegen --url. The first fetch is redundant.

- BUG-1: Fix NameError in loader.py when MpakClient() raises
- SECURITY-1: Add zip slip protection in load_bundle_from_url
- DEAD-CODE-1: Remove unreachable 404 branch in get_bundle_download
- DESIGN-1: Use self._client.stream() for bundle downloads
- DESIGN-2: Derive version from importlib.metadata
- Remove dead code (verify_integrity_bytes, unused fixture, pytest-asyncio)
- Fix PRODUCT.md and CLAUDE.md monorepo layout (sdk/ -> sdk-typescript/)
- Fix generate-types.py redundant fetch
- Add ty type checker to dev deps
- Add sdk-python-ci.yml (ruff, ty, pytest on PR/push)
- Add sdk-python-publish.yml (trusted publisher, publish on tag)
- Add 8 unit tests covering all fixed bugs (12 -> 20 total)
Copy link
Collaborator

@JoeCardoso13 JoeCardoso13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review

All findings addressed except one minor item. Solid work — the fixes are correct and well-tested.

Status

ID Status
BUG-1 Fixed, with tests (test_loader.py)
SECURITY-1 Fixed, with tests (zip slip rejection + cleanup)
DEAD-CODE-1 Fixed, with tests (500 → MpakError not MpakNotFoundError)
DESIGN-1 Fixed, with tests (User-Agent header verified on download)
DESIGN-2 Fixed (importlib.metadata in both __init__.py and types.py)
MINOR-1 Fixed (dead code removed)
MINOR-2 Fixed (unused fixture removed)
MINOR-3 Fixed (PRODUCT.md paths corrected)
MINOR-4 Fixed (pytest-asyncio removed)
MINOR-5 Not addressed — see below
MINOR-6 Fixed (redundant fetch removed)

MINOR-5 — Brittle EmailStr replacement in generate-types.py (still open)

The hardcoded 4-space indent assumption in the post-processing step is still there:

content = content.replace("    EmailStr,\n", "").replace("EmailStr", "str")

This will silently fail to remove the EmailStr import if datamodel-codegen ever changes its output formatting. Low risk since the generated file is checked in and would be caught on visual inspection, but worth a regex or AST-based replacement at some point.

@mgoldsborough mgoldsborough merged commit ef181c5 into main Feb 20, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants